Overview

Dataset statistics

Number of variables: 5
Number of observations: 6040
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 236.1 KiB
Average record size in memory: 40.0 B
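
The average record size can be cross-checked from the total size in memory and the number of observations. The following is a minimal sketch, assuming the report's KiB means 1024 bytes; the constant and function names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
TOTAL_SIZE_KIB = 236.1
N_OBSERVATIONS = 6040


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean number of bytes per row, taking 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


avg = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"{avg:.1f} B")
```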

Variable types

Numeric: 4
Categorical: 1

Alerts

Zip-code is highly skewed (γ1 = 77.57032144) [Skewed]
UserID is uniformly distributed [Uniform]
UserID has unique values [Unique]
Occupation has 711 (11.8%) zeros [Zeros]

Reproduction

Analysis started: 2025-07-25 17:17:20.377665
Analysis finished: 2025-07-25 17:18:53.958350
Duration: 1 minute and 33.58 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
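
A report of this kind is typically produced with `ProfileReport` from ydata-profiling. The sketch below is not the script that generated this report: the CSV input, the `build_report` helper, and the output file name are all assumptions made for illustration. Only the column names and the report title are taken from the report itself.

```python
from pathlib import Path

# Column names as they appear in this report.
EXPECTED_COLUMNS = ["UserID", "Gender", "Age", "Occupation", "Zip-code"]


def build_report(csv_path: str, html_path: str = "report.html") -> Path:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the data is available as a CSV file with a header row.
    df = pd.read_csv(csv_path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    profile = ProfileReport(df, title="Overview")
    profile.to_file(html_path)
    return Path(html_path)
```

Calling `build_report("users.csv")` (a hypothetical path) would write `report.html` next to the script. Timing and memory figures will differ from the ones listed here.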

Variables

UserID
Real number (ℝ)

Uniform  Unique 

Distinct: 6040
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3020.5
Minimum: 1
Maximum: 6040
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 302.95
Q1: 1510.75
Median: 3020.5
Q3: 4530.25
95-th percentile: 5738.05
Maximum: 6040
Range: 6039
Interquartile range (IQR): 3019.5

Descriptive statistics

Standard deviation: 1743.7421
Coefficient of variation (CV): 0.57730248
Kurtosis: -1.2
Mean: 3020.5
Median Absolute Deviation (MAD): 1510
Skewness: 0
Sum: 18243820
Variance: 3040636.7
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
6040                 1      < 0.1%
1                    1      < 0.1%
2                    1      < 0.1%
3                    1      < 0.1%
4                    1      < 0.1%
5                    1      < 0.1%
6                    1      < 0.1%
7                    1      < 0.1%
6024                 1      < 0.1%
6023                 1      < 0.1%
Other values (6030)  6030   99.8%

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Value  Count  Frequency (%)
6040   1      < 0.1%
6039   1      < 0.1%
6038   1      < 0.1%
6037   1      < 0.1%
6036   1      < 0.1%
6035   1      < 0.1%
6034   1      < 0.1%
6033   1      < 0.1%
6032   1      < 0.1%
6031   1      < 0.1%
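
Because UserID takes every integer from 1 to 6040 exactly once, its statistics follow from closed-form expressions for the sequence 1..N. The sketch below reproduces several of the figures above. It assumes the variance is the sample variance (divisor N - 1) and that percentiles use linear interpolation between order statistics; both assumptions are consistent with the reported values but are not stated in the report.

```python
import math

N = 6040  # UserID takes every integer from 1 to N exactly once


def percentile(n: int, q: float) -> float:
    """Linearly interpolated q-quantile of the sorted values 1..n."""
    return 1 + q * (n - 1)


total = N * (N + 1) // 2        # sum of 1..N
mean = (N + 1) / 2
variance = N * (N + 1) / 12     # sample variance (divisor N - 1) of 1..N
std = math.sqrt(variance)
cv = std / mean
iqr = percentile(N, 0.75) - percentile(N, 0.25)

print(total, mean, round(variance, 1), round(std, 4), round(cv, 8))
print(percentile(N, 0.05), percentile(N, 0.25), percentile(N, 0.95), iqr)
```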

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 47.3 KiB
M: 4331
F: 1709

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6040
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: M
3rd row: M
4th row: M
5th row: M

Common Values

Value  Count  Frequency (%)
M      4331   71.7%
F      1709   28.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
m      4331   71.7%
f      1709   28.3%

Most occurring characters

Value  Count  Frequency (%)
M      4331   71.7%
F      1709   28.3%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  6040   100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
M      4331   71.7%
F      1709   28.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  6040   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
M      4331   71.7%
F      1709   28.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6040   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
M      4331   71.7%
F      1709   28.3%

Age
Real number (ℝ)

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.639238
Minimum: 1
Maximum: 56
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 18
Q1: 25
Median: 25
Q3: 35
95-th percentile: 56
Maximum: 56
Range: 55
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 12.895962
Coefficient of variation (CV): 0.42089694
Kurtosis: -0.29081008
Mean: 30.639238
Median Absolute Deviation (MAD): 7
Skewness: 0.24270008
Sum: 185061
Variance: 166.30583
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
25     2096   34.7%
35     1193   19.8%
18     1103   18.3%
45     550    9.1%
50     496    8.2%
56     380    6.3%
1      222    3.7%

Value  Count  Frequency (%)
1      222    3.7%
18     1103   18.3%
25     2096   34.7%
35     1193   19.8%
45     550    9.1%
50     496    8.2%
56     380    6.3%

Value  Count  Frequency (%)
56     380    6.3%
50     496    8.2%
45     550    9.1%
35     1193   19.8%
25     2096   34.7%
18     1103   18.3%
1      222    3.7%
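
Since Age has only seven distinct values, its summary statistics can be recomputed directly from the frequency table. The sketch below does this with count-weighted sums, assuming the reported variance uses the sample divisor (n - 1). The `weighted_stats` helper is illustrative and not part of the report.

```python
import math

# Value -> count, copied from the Age frequency table above.
AGE_COUNTS = {1: 222, 18: 1103, 25: 2096, 35: 1193, 45: 550, 50: 496, 56: 380}


def weighted_stats(counts: dict) -> tuple:
    """Return (n, sum, mean, sample variance, sample std) from value counts."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    # Sum of squared deviations, each weighted by how often the value occurs.
    ss = sum(count * (value - mean) ** 2 for value, count in counts.items())
    variance = ss / (n - 1)
    return n, total, mean, variance, math.sqrt(variance)


n, total, mean, variance, std = weighted_stats(AGE_COUNTS)
print(n, total, round(mean, 6), round(variance, 5), round(std, 6))
```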

Occupation
Real number (ℝ)

Zeros 

Distinct: 21
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.1468543
Minimum: 0
Maximum: 20
Zeros: 711
Zeros (%): 11.8%
Negative: 0
Negative (%): 0.0%
Memory size: 47.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
Median: 7
Q3: 14
95-th percentile: 19
Maximum: 20
Range: 20
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 6.3295115
Coefficient of variation (CV): 0.77692705
Kurtosis: -1.2141444
Mean: 8.1468543
Median Absolute Deviation (MAD): 5
Skewness: 0.33829811
Sum: 49207
Variance: 40.062716
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
Value              Count  Frequency (%)
4                  759    12.6%
0                  711    11.8%
7                  679    11.2%
1                  528    8.7%
17                 502    8.3%
12                 388    6.4%
14                 302    5.0%
20                 281    4.7%
2                  267    4.4%
16                 241    4.0%
Other values (11)  1382   22.9%

Value  Count  Frequency (%)
0      711    11.8%
1      528    8.7%
2      267    4.4%
3      173    2.9%
4      759    12.6%
5      112    1.9%
6      236    3.9%
7      679    11.2%
8      17     0.3%
9      92     1.5%

Value  Count  Frequency (%)
20     281    4.7%
19     72     1.2%
18     70     1.2%
17     502    8.3%
16     241    4.0%
15     144    2.4%
14     302    5.0%
13     142    2.4%
12     388    6.4%
11     129    2.1%
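
The tables above list counts for 20 of the 21 Occupation codes. The remaining count can be derived from the number of observations and checked against the reported Sum. The sketch below assumes the codes are the integers 0 to 20, as the minimum, maximum, and distinct count suggest. The derived figure is an inference from the report, not a value the report displays.

```python
TOTAL_ROWS = 6040   # number of observations
TOTAL_SUM = 49207   # "Sum" from the descriptive statistics

# Code -> count, copied from the lowest-value and highest-value tables above.
LISTED = {
    0: 711, 1: 528, 2: 267, 3: 173, 4: 759, 5: 112, 6: 236, 7: 679, 8: 17, 9: 92,
    11: 129, 12: 388, 13: 142, 14: 302, 15: 144, 16: 241, 17: 502, 18: 70, 19: 72,
    20: 281,
}

# Codes in 0..20 that no table lists, and the rows left over for them.
missing_codes = [code for code in range(21) if code not in LISTED]
missing_count = TOTAL_ROWS - sum(LISTED.values())

# Cross-check: the implied total must reproduce the reported Sum.
implied_sum = sum(code * count for code, count in LISTED.items())
implied_sum += missing_codes[0] * missing_count

print(missing_codes, missing_count, implied_sum == TOTAL_SUM)
```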

Zip-code
Real number (ℝ)

Skewed 

Distinct: 3403
Distinct (%): 56.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87590.829
Minimum: 231
Maximum: 1.9312204 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.3 KiB

Quantile statistics

Minimum: 231
5-th percentile: 2714.95
Q1: 22314
Median: 55107
Q3: 89110
95-th percentile: 97214
Maximum: 1.9312204 × 10^8
Range: 1.9312181 × 10^8
Interquartile range (IQR): 66796

Descriptive statistics

Standard deviation: 2485802.4
Coefficient of variation (CV): 28.37971
Kurtosis: 6024.5333
Mean: 87590.829
Median Absolute Deviation (MAD): 32900
Skewness: 77.570321
Sum: 5.2904861 × 10^8
Variance: 6.1792134 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
48104                19     0.3%
22903                18     0.3%
94110                17     0.3%
55104                17     0.3%
10025                16     0.3%
55455                16     0.3%
55105                16     0.3%
48103                15     0.2%
94114                15     0.2%
55408                15     0.2%
Other values (3393)  5876   97.3%

Value  Count  Frequency (%)
231    1      < 0.1%
606    1      < 0.1%
681    1      < 0.1%
693    1      < 0.1%
918    1      < 0.1%
926    1      < 0.1%
961    1      < 0.1%
1002   5      0.1%
1003   1      < 0.1%
1020   1      < 0.1%

Value      Count  Frequency (%)
193122042  1      < 0.1%
5849574    1      < 0.1%
2020010    1      < 0.1%
970025     1      < 0.1%
956456     1      < 0.1%
954025     1      < 0.1%
949702     1      < 0.1%
495321     2      < 0.1%
444555     1      < 0.1%
400060     1      < 0.1%
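
The extreme skewness comes from treating Zip-code as a number: a handful of values with more than five digits dominate the mean and variance, while codes with leading zeros shrink to small integers such as 231. The sketch below shows one way to separate the two cases. It assumes five-digit codes are the intended format, which the report does not state, and `normalize_zip` is a hypothetical helper.

```python
from typing import Optional


def normalize_zip(value: int, width: int = 5) -> Optional[str]:
    """Return the zero-padded code, or None if it cannot fit in `width` digits."""
    text = str(value)
    if value < 0 or len(text) > width:
        return None  # too long (or negative): flag for manual review
    return text.zfill(width)


# A few values taken from the tables above.
for raw in [231, 1002, 48104, 400060, 193122042]:
    print(raw, normalize_zip(raw))
```

Keeping the column as a string (or categorical) type when profiling would avoid the numeric summary altogether.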

Interactions

[Figure: interaction plots (16 panels)]

Correlations

            Age     Gender  Occupation  UserID  Zip-code
Age         1.000   0.047   0.079       0.012   -0.010
Gender      0.047   1.000   0.240       0.063   0.000
Occupation  0.079   0.240   1.000       -0.016  0.042
UserID      0.012   0.063   -0.016      1.000   -0.060
Zip-code    -0.010  0.000   0.042       -0.060  1.000
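
To read the correlation matrix above programmatically, the sketch below checks that it is symmetric and finds the strongest off-diagonal association. The values are copied from the matrix; the helper names are illustrative.

```python
LABELS = ["Age", "Gender", "Occupation", "UserID", "Zip-code"]
MATRIX = [
    [1.000, 0.047, 0.079, 0.012, -0.010],
    [0.047, 1.000, 0.240, 0.063, 0.000],
    [0.079, 0.240, 1.000, -0.016, 0.042],
    [0.012, 0.063, -0.016, 1.000, -0.060],
    [-0.010, 0.000, 0.042, -0.060, 1.000],
]


def is_symmetric(matrix: list) -> bool:
    """True when matrix[i][j] equals matrix[j][i] for every pair of indices."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))


def strongest_pair(labels: list, matrix: list) -> tuple:
    """Return (|coefficient|, label_a, label_b) for the largest off-diagonal entry."""
    size = len(matrix)
    pairs = [
        (abs(matrix[i][j]), labels[i], labels[j])
        for i in range(size)
        for j in range(i + 1, size)
    ]
    return max(pairs)


print(is_symmetric(MATRIX), strongest_pair(LABELS, MATRIX))
```

Even the strongest pair, Gender and Occupation at 0.240, is a weak association. Every other coefficient is below 0.1 in absolute value.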

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

      UserID  Gender  Age  Occupation  Zip-code
0     1       F       1    10          48067
1     2       M       56   16          70072
2     3       M       25   15          55117
3     4       M       45   7           02460
4     5       M       25   20          55455
5     6       F       50   9           55117
6     7       M       35   1           06810
7     8       M       25   12          11413
8     9       M       25   17          61614
9     10      F       35   1           95370

      UserID  Gender  Age  Occupation  Zip-code
6030  6031    F       18   0           45123
6031  6032    M       45   7           55108
6032  6033    M       50   13          78232
6033  6034    M       25   14          94117
6034  6035    F       25   1           78734
6035  6036    F       25   15          32603
6036  6037    F       45   1           76006
6037  6038    F       56   1           14706
6038  6039    F       45   0           01060
6039  6040    M       25   6           11106